{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "# Getting started with neural networks: Classification and regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Classifying movie reviews: A binary classification example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### The IMDB dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Loading the IMDB dataset**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "from tensorflow.keras.datasets import imdb\n",
    "(train_data, train_labels), (test_data, test_labels) = imdb.load_data(\n",
    "    num_words=10000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "train_data[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "train_labels[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "max([max(sequence) for sequence in train_data])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Decoding reviews back to text**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "word_index = imdb.get_word_index()\n",
    "reverse_word_index = dict(\n",
    "    [(value, key) for (key, value) in word_index.items()])\n",
    "decoded_review = \" \".join(\n",
    "    [reverse_word_index.get(i - 3, \"?\") for i in train_data[0]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Preparing the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Encoding the integer sequences via multi-hot encoding**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "def vectorize_sequences(sequences, dimension=10000):\n",
    "    results = np.zeros((len(sequences), dimension))\n",
    "    for i, sequence in enumerate(sequences):\n",
    "        for j in sequence:\n",
    "            results[i, j] = 1.\n",
    "    return results\n",
    "x_train = vectorize_sequences(train_data)\n",
    "x_test = vectorize_sequences(test_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "x_train[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "y_train = np.asarray(train_labels).astype(\"float32\")\n",
    "y_test = np.asarray(test_labels).astype(\"float32\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Building your model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Model definition**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "from tensorflow import keras\n",
    "from tensorflow.keras import layers\n",
    "\n",
    "model = keras.Sequential([\n",
    "    layers.Dense(16, activation=\"relu\"),\n",
    "    layers.Dense(16, activation=\"relu\"),\n",
    "    layers.Dense(1, activation=\"sigmoid\")\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Compiling the model**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "model.compile(optimizer=\"rmsprop\",\n",
    "              loss=\"binary_crossentropy\",\n",
    "              metrics=[\"accuracy\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Validating your approach"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Setting aside a validation set**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "x_val = x_train[:10000]\n",
    "partial_x_train = x_train[10000:]\n",
    "y_val = y_train[:10000]\n",
    "partial_y_train = y_train[10000:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Training your model**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "history = model.fit(partial_x_train,\n",
    "                    partial_y_train,\n",
    "                    epochs=20,\n",
    "                    batch_size=512,\n",
    "                    validation_data=(x_val, y_val))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "history_dict = history.history\n",
    "history_dict.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Plotting the training and validation loss**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "history_dict = history.history\n",
    "loss_values = history_dict[\"loss\"]\n",
    "val_loss_values = history_dict[\"val_loss\"]\n",
    "epochs = range(1, len(loss_values) + 1)\n",
    "plt.plot(epochs, loss_values, \"bo\", label=\"Training loss\")\n",
    "plt.plot(epochs, val_loss_values, \"b\", label=\"Validation loss\")\n",
    "plt.title(\"Training and validation loss\")\n",
    "plt.xlabel(\"Epochs\")\n",
    "plt.ylabel(\"Loss\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Plotting the training and validation accuracy**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "plt.clf()\n",
    "acc = history_dict[\"accuracy\"]\n",
    "val_acc = history_dict[\"val_accuracy\"]\n",
    "plt.plot(epochs, acc, \"bo\", label=\"Training acc\")\n",
    "plt.plot(epochs, val_acc, \"b\", label=\"Validation acc\")\n",
    "plt.title(\"Training and validation accuracy\")\n",
    "plt.xlabel(\"Epochs\")\n",
    "plt.ylabel(\"Accuracy\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Retraining a model from scratch**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "model = keras.Sequential([\n",
    "    layers.Dense(16, activation=\"relu\"),\n",
    "    layers.Dense(16, activation=\"relu\"),\n",
    "    layers.Dense(1, activation=\"sigmoid\")\n",
    "])\n",
    "model.compile(optimizer=\"rmsprop\",\n",
    "              loss=\"binary_crossentropy\",\n",
    "              metrics=[\"accuracy\"])\n",
    "model.fit(x_train, y_train, epochs=4, batch_size=512)\n",
    "results = model.evaluate(x_test, y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Using a trained model to generate predictions on new data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "model.predict(x_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Further experiments"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Wrapping up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Classifying newswires: A multiclass classification example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### The Reuters dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Loading the Reuters dataset**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "from tensorflow.keras.datasets import reuters\n",
    "(train_data, train_labels), (test_data, test_labels) = reuters.load_data(\n",
    "    num_words=10000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "len(train_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "len(test_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "train_data[10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Decoding newswires back to text**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "word_index = reuters.get_word_index()\n",
    "reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])\n",
    "decoded_newswire = \" \".join([reverse_word_index.get(i - 3, \"?\") for i in\n",
    "    train_data[0]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "train_labels[10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Preparing the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Encoding the input data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "x_train = vectorize_sequences(train_data)\n",
    "x_test = vectorize_sequences(test_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Encoding the labels**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "def to_one_hot(labels, dimension=46):\n",
    "    results = np.zeros((len(labels), dimension))\n",
    "    for i, label in enumerate(labels):\n",
    "        results[i, label] = 1.\n",
    "    return results\n",
    "y_train = to_one_hot(train_labels)\n",
    "y_test = to_one_hot(test_labels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "from tensorflow.keras.utils import to_categorical\n",
    "y_train = to_categorical(train_labels)\n",
    "y_test = to_categorical(test_labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Building your model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Model definition**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "model = keras.Sequential([\n",
    "    layers.Dense(64, activation=\"relu\"),\n",
    "    layers.Dense(64, activation=\"relu\"),\n",
    "    layers.Dense(46, activation=\"softmax\")\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Compiling the model**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "model.compile(optimizer=\"rmsprop\",\n",
    "              loss=\"categorical_crossentropy\",\n",
    "              metrics=[\"accuracy\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Validating your approach"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Setting aside a validation set**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "x_val = x_train[:1000]\n",
    "partial_x_train = x_train[1000:]\n",
    "y_val = y_train[:1000]\n",
    "partial_y_train = y_train[1000:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Training the model**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "history = model.fit(partial_x_train,\n",
    "                    partial_y_train,\n",
    "                    epochs=20,\n",
    "                    batch_size=512,\n",
    "                    validation_data=(x_val, y_val))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Plotting the training and validation loss**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "loss = history.history[\"loss\"]\n",
    "val_loss = history.history[\"val_loss\"]\n",
    "epochs = range(1, len(loss) + 1)\n",
    "plt.plot(epochs, loss, \"bo\", label=\"Training loss\")\n",
    "plt.plot(epochs, val_loss, \"b\", label=\"Validation loss\")\n",
    "plt.title(\"Training and validation loss\")\n",
    "plt.xlabel(\"Epochs\")\n",
    "plt.ylabel(\"Loss\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Plotting the training and validation accuracy**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "plt.clf()\n",
    "acc = history.history[\"accuracy\"]\n",
    "val_acc = history.history[\"val_accuracy\"]\n",
    "plt.plot(epochs, acc, \"bo\", label=\"Training accuracy\")\n",
    "plt.plot(epochs, val_acc, \"b\", label=\"Validation accuracy\")\n",
    "plt.title(\"Training and validation accuracy\")\n",
    "plt.xlabel(\"Epochs\")\n",
    "plt.ylabel(\"Accuracy\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Retraining a model from scratch**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "model = keras.Sequential([\n",
    "  layers.Dense(64, activation=\"relu\"),\n",
    "  layers.Dense(64, activation=\"relu\"),\n",
    "  layers.Dense(46, activation=\"softmax\")\n",
    "])\n",
    "model.compile(optimizer=\"rmsprop\",\n",
    "              loss=\"categorical_crossentropy\",\n",
    "              metrics=[\"accuracy\"])\n",
    "model.fit(x_train,\n",
    "          y_train,\n",
    "          epochs=9,\n",
    "          batch_size=512)\n",
    "results = model.evaluate(x_test, y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "import copy\n",
    "test_labels_copy = copy.copy(test_labels)\n",
    "np.random.shuffle(test_labels_copy)\n",
    "hits_array = np.array(test_labels) == np.array(test_labels_copy)\n",
    "hits_array.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Generating predictions on new data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "predictions = model.predict(x_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "predictions[0].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "np.sum(predictions[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "np.argmax(predictions[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### A different way to handle the labels and the loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "y_train = np.array(train_labels)\n",
    "y_test = np.array(test_labels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "model.compile(optimizer=\"rmsprop\",\n",
    "              loss=\"sparse_categorical_crossentropy\",\n",
    "              metrics=[\"accuracy\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### The importance of having sufficiently large intermediate layers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**A model with an information bottleneck**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "model = keras.Sequential([\n",
    "    layers.Dense(64, activation=\"relu\"),\n",
    "    layers.Dense(4, activation=\"relu\"),\n",
    "    layers.Dense(46, activation=\"softmax\")\n",
    "])\n",
    "model.compile(optimizer=\"rmsprop\",\n",
    "              loss=\"categorical_crossentropy\",\n",
    "              metrics=[\"accuracy\"])\n",
    "model.fit(partial_x_train,\n",
    "          partial_y_train,\n",
    "          epochs=20,\n",
    "          batch_size=128,\n",
    "          validation_data=(x_val, y_val))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Further experiments"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Wrapping up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Predicting house prices: A regression example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### The Boston Housing Price dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Loading the Boston housing dataset**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "from tensorflow.keras.datasets import boston_housing\n",
    "(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "train_data.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "test_data.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "train_targets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Preparing the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Normalizing the data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "mean = train_data.mean(axis=0)\n",
    "train_data -= mean\n",
    "std = train_data.std(axis=0)\n",
    "train_data /= std\n",
    "test_data -= mean\n",
    "test_data /= std"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Building your model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Model definition**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "def build_model():\n",
    "    model = keras.Sequential([\n",
    "        layers.Dense(64, activation=\"relu\"),\n",
    "        layers.Dense(64, activation=\"relu\"),\n",
    "        layers.Dense(1)\n",
    "    ])\n",
    "    model.compile(optimizer=\"rmsprop\", loss=\"mse\", metrics=[\"mae\"])\n",
    "    return model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Validating your approach using K-fold validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**K-fold validation**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "k = 4\n",
    "num_val_samples = len(train_data) // k\n",
    "num_epochs = 100\n",
    "all_scores = []\n",
    "for i in range(k):\n",
    "    print(f\"Processing fold #{i}\")\n",
    "    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]\n",
    "    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]\n",
    "    partial_train_data = np.concatenate(\n",
    "        [train_data[:i * num_val_samples],\n",
    "         train_data[(i + 1) * num_val_samples:]],\n",
    "        axis=0)\n",
    "    partial_train_targets = np.concatenate(\n",
    "        [train_targets[:i * num_val_samples],\n",
    "         train_targets[(i + 1) * num_val_samples:]],\n",
    "        axis=0)\n",
    "    model = build_model()\n",
    "    model.fit(partial_train_data, partial_train_targets,\n",
    "              epochs=num_epochs, batch_size=16, verbose=0)\n",
    "    val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)\n",
    "    all_scores.append(val_mae)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "all_scores"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "np.mean(all_scores)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Saving the validation logs at each fold**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "num_epochs = 500\n",
    "all_mae_histories = []\n",
    "for i in range(k):\n",
    "    print(f\"Processing fold #{i}\")\n",
    "    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]\n",
    "    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]\n",
    "    partial_train_data = np.concatenate(\n",
    "        [train_data[:i * num_val_samples],\n",
    "         train_data[(i + 1) * num_val_samples:]],\n",
    "        axis=0)\n",
    "    partial_train_targets = np.concatenate(\n",
    "        [train_targets[:i * num_val_samples],\n",
    "         train_targets[(i + 1) * num_val_samples:]],\n",
    "        axis=0)\n",
    "    model = build_model()\n",
    "    history = model.fit(partial_train_data, partial_train_targets,\n",
    "                        validation_data=(val_data, val_targets),\n",
    "                        epochs=num_epochs, batch_size=16, verbose=0)\n",
    "    mae_history = history.history[\"val_mae\"]\n",
    "    all_mae_histories.append(mae_history)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Building the history of successive mean K-fold validation scores**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "average_mae_history = [\n",
    "    np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Plotting validation scores**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "plt.plot(range(1, len(average_mae_history) + 1), average_mae_history)\n",
    "plt.xlabel(\"Epochs\")\n",
    "plt.ylabel(\"Validation MAE\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Plotting validation scores, excluding the first 10 data points**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "truncated_mae_history = average_mae_history[10:]\n",
    "plt.plot(range(1, len(truncated_mae_history) + 1), truncated_mae_history)\n",
    "plt.xlabel(\"Epochs\")\n",
    "plt.ylabel(\"Validation MAE\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Training the final model**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "model = build_model()\n",
    "model.fit(train_data, train_targets,\n",
    "          epochs=130, batch_size=16, verbose=0)\n",
    "test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "test_mae_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Generating predictions on new data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "predictions = model.predict(test_data)\n",
    "predictions[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Wrapping up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Summary"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "chapter04_getting-started-with-neural-networks.i",
   "private_outputs": false,
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}